What is the True Normal Human Body Temperature?

Background

The mean normal body temperature was held to be 37$^{\circ}$C or 98.6$^{\circ}$F for more than 120 years since it was first conceptualized and reported by Carl Wunderlich in a famous 1868 book. But, is this value statistically correct?

Exercises

In this exercise, you will analyze a dataset of human body temperatures and employ the concepts of hypothesis testing, confidence intervals, and statistical significance.

Answer the following questions in the notebook below and submit it to your GitHub account.

1) Is the distribution of body temperatures normal?
    - Although this is not a requirement for CLT to hold (read CLT carefully), it gives us some peace of mind that the population may also be normally distributed if we assume that this sample is representative of the population.
2) Is the sample size large? Are the observations independent?
    - Remember that this is a condition for the CLT, and hence the statistical tests we are using, to apply.
3) Is the true population mean really 98.6 degrees F?
    - Would you use a one-sample or two-sample test? Why?
    - In this situation, is it appropriate to use the $t$ or $z$ statistic?
    - Now try using the other test. How is the result different? Why?
4) Draw a small sample of size 10 from the data and repeat both tests.
    - Which one is the correct one to use?
    - What do you notice? What does this tell you about the difference in application of the $t$ and $z$ statistic?
5) At what temperature should we consider someone's temperature to be "abnormal"?
    - Start by computing the margin of error and confidence interval.
6) Is there a significant difference between males and females in normal temperature?
    - What test did you use and why?
    - Write a story with your conclusion in the context of the original problem.

You can include written notes in notebook cells using Markdown:

In the control panel at the top, choose Cell > Cell Type > Markdown
Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet

Resources

Information and data sources: http://www.amstat.org/publications/jse/datasets/normtemp.txt, http://www.amstat.org/publications/jse/jse_data_archive.htm
Markdown syntax: http://nestacms.com/docs/creating-content/markdown-cheat-sheet



In [ ]:

    
import pandas as pd
import numpy as np

df = pd.read_csv('data/human_body_temperature.csv')



In [ ]:

    
df.info()
df.head()









    



<class 'pandas.core.frame.DataFrame'>
RangeIndex: 130 entries, 0 to 129
Data columns (total 3 columns):
temperature    130 non-null float64
gender         130 non-null object
heart_rate     130 non-null float64
dtypes: float64(2), object(1)
memory usage: 3.1+ KB






    Out[ ]:

   temperature gender  heart_rate
0         99.3      F        68.0
1         98.4      F        81.0
2         97.8      M        73.0
3         99.2      F        66.0
4         98.0      F        73.0

1) Is the distribution of body temperatures normal?

Although this is not a requirement for CLT to hold (read CLT carefully), it gives us some peace of mind that the population may also be normally distributed if we assume that this sample is representative of the population.

We first start by viewing the histogram of the human body temperatures.



In [ ]:

    
# Plots the histogram of temperatures
import matplotlib.pyplot as plt
import seaborn as sns

temperature = df['temperature']

sns.set()
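# density=True normalizes the histogram to unit area (it replaces the removed `normed` argument)
plt.hist(temperature, bins='auto', density=True)
plt.xlabel('Temperature (F)')
plt.ylabel('Probability density')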
plt.title('Human Body Temperature')
plt.show()

It is difficult to conclude from this histogram alone whether the data are normally distributed. A clearer comparison is to overlay the empirical CDF (ECDF) of the temperature data on the CDF of a normal distribution with the same mean and standard deviation.



In [ ]:

    
# Plots the ECDF and CDF of the human body temperatures
def ecdf(data):
    """
    Compute ECDF for a one-dimensional array of measurements.
    Returns tuple of arrays (x,y) that contain x and y values for ECDF.
    """
    x = np.sort(data)
    y = np.arange(1, len(x) + 1) / len(x)
    
    return x, y

x_ecdf, y_ecdf = ecdf(temperature)

temperature_theoretical = np.random.normal(np.mean(temperature), np.std(temperature), size=10000)

x_theoretical_cdf, y_theoretical_cdf = ecdf(temperature_theoretical)

plt.plot(x_ecdf, y_ecdf, marker='.', linestyle='none')
plt.plot(x_theoretical_cdf, y_theoretical_cdf)
plt.xlabel('Temperature(F)')
plt.ylabel('CDF')
plt.title('Human Body Temperature')
# Legend labels follow plotting order: the ECDF points first, then the theoretical CDF line
plt.legend(('ECDF', 'CDF'), loc='lower right')
plt.show()

The ECDF and the theoretical CDF in the graph above align closely, suggesting that the temperature data are approximately normally distributed. We can run a normality test to double-check.



In [ ]:

    
# Performs normal test
import scipy.stats as stats

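def isNormal(data):
    # D'Agostino-Pearson test: the null hypothesis is that the data come from a normal distribution
    z, p = stats.mstats.normaltest(data)
    
    if (p < 0.05):  # conventional 5% significance level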
        print('The data is more likely NOT normally distributed')
    else:
        print('The data is more likely normally distributed')
        
isNormal(temperature)









    



The data is more likely normally distributed
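As a cross-check on the test above, a second normality test can be run. The sketch below is only an illustration on synthetic data, not on the notebook's dataset: it applies `scipy.stats.shapiro` (the Shapiro-Wilk test) to a normal sample and to a clearly right-skewed exponential sample. The seed, the sample sizes, and the 0.05 threshold are assumptions chosen for the example.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)  # arbitrary seed, assumed for reproducibility

# Synthetic stand-ins (NOT the real temperature data)
normal_sample = rng.normal(loc=98.25, scale=0.73, size=130)
skewed_sample = rng.exponential(scale=1.0, size=130)

results = {}
for name, sample in [("normal", normal_sample), ("skewed", skewed_sample)]:
    # Shapiro-Wilk: the null hypothesis is that the sample comes from a normal distribution
    w_stat, p_value = stats.shapiro(sample)
    results[name] = (w_stat, p_value)
    verdict = "normality rejected" if p_value < 0.05 else "no evidence against normality"
    print(f"{name}: W = {w_stat:.3f}, p = {p_value:.3g} -> {verdict}")
```

A large p value does not prove normality; it only means the test found no evidence against it.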

2) Is the sample size large? Are the observations independent?

Remember that this is a condition for the CLT, and hence the statistical tests we are using, to apply.

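The rule of thumb for the Central Limit Theorem is that a sample of 30 or more observations is considered large, so our sample of 130 qualifies. The observations can also be treated as independent, assuming each row is a measurement from a different person: 130 people is far less than 10% of the human population, so sampling without replacement has a negligible effect.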
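To see why the sample size matters, the following sketch simulates the Central Limit Theorem using only the standard library. It draws repeated samples from an exponential distribution (deliberately non-normal, with mean 1 and standard deviation 1) and checks that the spread of the sample means shrinks like $1/\sqrt{n}$. The distribution, seed, and number of trials are assumptions made for illustration.

```python
import math
import random
import statistics

random.seed(0)  # arbitrary seed, assumed for reproducibility

def sample_means(n, trials=2000):
    """Return the means of `trials` samples of size n from an exponential(1) distribution."""
    return [
        statistics.fmean([random.expovariate(1.0) for _ in range(n)])
        for _ in range(trials)
    ]

spread = {}
for n in (2, 30, 130):
    means = sample_means(n)
    spread[n] = statistics.stdev(means)
    # The CLT predicts a standard deviation of sigma / sqrt(n) = 1 / sqrt(n) for the sample mean
    print(f"n = {n:>3}: observed spread = {spread[n]:.4f}, predicted = {1 / math.sqrt(n):.4f}")
```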

3) Is the true population mean really 98.6 degrees F?

Would you use a one-sample or two-sample test? Why?
In this situation, is it appropriate to use the $t$ or $z$ statistic?
Now try using the other test. How is the result different? Why?

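A one-sample test should be used here because we have only one set of data, which we compare against a single fixed value (98.6). By the common rule of thumb, the $z$ statistic is appropriate in this case because the sample size is 30 or greater, while the $t$ statistic should be used if the sample size is less than 30. Strictly speaking, the population standard deviation is unknown, so the $t$ statistic is the exact choice, but with $n = 130$ the two give nearly the same answer.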
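For reference, SciPy provides a ready-made one-sample $t$ test. The sketch below runs `scipy.stats.ttest_1samp` on synthetic data (the seed and the distribution parameters are assumptions, not the notebook's data) and confirms that it matches the statistic computed by hand with the sample standard deviation (`ddof=1`).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)  # arbitrary seed, assumed for reproducibility

# Synthetic stand-in for the temperature column (NOT the real data)
synthetic = rng.normal(loc=98.25, scale=0.73, size=130)

# One-sample, two-sided t test against the hypothesized mean of 98.6
t_stat, p_value = stats.ttest_1samp(synthetic, popmean=98.6)

# The same statistic by hand: (sample mean - mu_0) / (s / sqrt(n))
manual_t = (synthetic.mean() - 98.6) / (synthetic.std(ddof=1) / np.sqrt(synthetic.size))

print(f"scipy t = {t_stat:.4f}, manual t = {manual_t:.4f}, p = {p_value:.3g}")
```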



In [ ]:

    
df.describe()









    Out[ ]:

       temperature  heart_rate
count   130.000000  130.000000
mean     98.249231   73.761538
std       0.733183    7.062077
min      96.300000   57.000000
25%      97.800000   69.000000
50%      98.300000   74.000000
75%      98.700000   79.000000
max     100.800000   89.000000

We will now perform a bootstrap hypothesis test with the following:

$H_0$: The true population mean temperature is 98.6. $\mu=\mu_0$

$H_A$: The true population mean is different from 98.6. $\mu\neq\mu_0$



In [ ]:

    
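# Calculates p value using 100,000 bootstrap replicates
# Shift the data so its mean equals 98.6, which simulates the null hypothesis
observed_mean = np.mean(temperature)
shifted_temperature = temperature - observed_mean + 98.6

bootstrap_replicates = np.empty(100000)

size = len(bootstrap_replicates)

for i in range(size):
    bootstrap_sample = np.random.choice(shifted_temperature, size=len(temperature))
    bootstrap_replicates[i] = np.mean(bootstrap_sample)

# Two-sided p value: fraction of replicates at least as far from 98.6 as the observed mean
p = np.sum(np.abs(bootstrap_replicates - 98.6) >= np.abs(observed_mean - 98.6)) / len(bootstrap_replicates)
print('p =', p)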

The p value is extremely small after 100,000 replicates, which implies that the true mean is different from 98.6 degrees F.
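The same resampling idea also yields a confidence interval for the mean. The sketch below is a minimal vectorized version on synthetic data (the seed, the distribution parameters, and the 10,000 replicates are assumptions for illustration): it resamples with replacement, takes the mean of each resample, and reads the 95% interval off the percentiles of those means.

```python
import numpy as np

rng = np.random.default_rng(7)  # arbitrary seed, assumed for reproducibility

# Synthetic stand-in for the temperature column (NOT the real data)
data = rng.normal(loc=98.25, scale=0.73, size=130)

# Draw all resamples at once: each row is one bootstrap sample the same size as the data
resamples = rng.choice(data, size=(10000, data.size), replace=True)
replicate_means = resamples.mean(axis=1)

# Percentile bootstrap interval for the mean
ci_low, ci_high = np.percentile(replicate_means, [2.5, 97.5])

print(f"sample mean = {data.mean():.3f}")
print(f"95% bootstrap CI for the mean = ({ci_low:.3f}, {ci_high:.3f})")
```

If a hypothesized value such as 98.6 falls well outside this interval, that agrees with a small p value from the hypothesis test.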

We can repeat the hypothesis test by also calculating the z-score to verify our results above.

$z = \dfrac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$

where $\bar{x}$ is the sample mean, $\mu_0 = 98.6$ is the hypothesized population mean, $\sigma$ is the population standard deviation, and $n$ is the sample size. Since we do not know the population's standard deviation, we approximate it with the sample standard deviation $s$, so the standard error of the mean is:

$\sigma / \sqrt{n} \approx s / \sqrt{n}$
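As a worked check of the arithmetic, the z-score can be computed from the summary statistics printed by `df.describe()` using only the standard library. This is a sketch that assumes the `std` reported by pandas (which uses `ddof=1`), so the result is slightly smaller in magnitude than a value computed with NumPy's default `ddof=0`.

```python
import math
from statistics import NormalDist

# Summary statistics taken from the df.describe() output
n = 130
sample_mean = 98.249231
sample_std = 0.733183  # pandas reports the sample standard deviation (ddof=1)
mu_0 = 98.6            # hypothesized population mean

standard_error = sample_std / math.sqrt(n)
z = (sample_mean - mu_0) / standard_error

# Two-sided p value from the standard normal distribution
p_two_sided = 2 * NormalDist().cdf(-abs(z))

print(f"standard error = {standard_error:.5f}")
print(f"z = {z:.4f}")
print(f"two-sided p = {p_two_sided:.3g}")
```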



In [ ]:

    
# Calculates z and p values and performs z test
# Note: np.std defaults to ddof=0 (population formula); ddof=1 would give a z of about -5.45
z = (np.mean(temperature) - 98.6) / (np.std(temperature) / np.sqrt(len(temperature)))
print('z =', z)

p_z = stats.norm.sf(abs(z))*2
print('p = p(z >= 5.476) + p(z <= -5.476) =', p_z)









    



z = -5.47592520208
p = p(z >= 5.476) + p(z <= -5.476) = 4.35231516588e-08

The p value is extremely small which confirms that the true mean is likely different from 98.6. We will compare the results with the t statistic. The $t$ and $z$ values should be approximately the same.



In [ ]:

    
# Performs t test
t = z
print('t =', t)

p_t = stats.t.sf(np.abs(t), len(temperature)-1)*2
print('p = p(t >= 5.476) + p(t <= -5.476) =', p_t)









    



t = -5.47592520208
p = p(t >= 5.476) + p(t <= -5.476) = 2.18874646241e-07

The p value from the $t$ test is slightly larger, because the $t$ distribution has heavier tails than the normal distribution, but it still leads us to reject the null hypothesis.

4) Draw a small sample of size 10 from the data and repeat both tests.

Which one is the correct one to use?
What do you notice? What does this tell you about the difference in application of the $t$ and $z$ statistic?

Since we will be drawing a random sample of size 10, the $t$ statistic is the more appropriate one to use.



In [ ]:

    
# Draws random sample of 10
sample = np.random.choice(temperature, size=10)

sample









    Out[ ]:





array([ 98. ,  98.3,  98. ,  97.1,  98.3,  96.3,  98. ,  98.4,  97.5,  98.4])



In [ ]:

    
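# Performs t test
t2 = (np.mean(sample) - 98.6) / (np.std(sample) / np.sqrt(len(sample)))

print('t =', t2)

# Use t2 from this sample (not the full-data statistic t) with n - 1 degrees of freedom
p_t2 = stats.t.sf(np.abs(t2), len(sample)-1)*2
print('p = ', p_t2)

t = -3.77478192892

For this sample, the two-sided p value for $t = -3.775$ with 9 degrees of freedom is approximately 0.0044. The sample is random, so the exact values change from run to run.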



In [ ]:

    
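# Performs z test
z2 = (np.mean(sample) - 98.6) / (np.std(sample) / np.sqrt(len(sample)))

print('z =', z2)

# Use z2 from this sample (not the full-data statistic z)
p_z2 = stats.norm.sf(abs(z2))*2
print('p =', p_z2)

z = -3.77478192892

For the same statistic, the two-sided p value under the standard normal distribution is approximately 0.00016.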

The p values for the $t$ and $z$ tests differ substantially. Because the $t$ distribution has heavier tails than the normal distribution when the degrees of freedom are small, the $z$ test understates the p value for a sample of 10. Applying the wrong test to a problem can therefore lead to an incorrect result, so it is important to know when it is appropriate to apply the $z$ statistic and the $t$ statistic. When the sample size is less than 30, the $t$ statistic should be used.
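The effect of sample size on the two tests can be seen directly by holding the test statistic fixed and varying the degrees of freedom. In this sketch the statistic of 2.5 and the chosen degrees of freedom are arbitrary assumptions for illustration; the $t$ p value is always larger than the normal p value and approaches it as the sample grows.

```python
from scipy import stats

statistic = 2.5  # arbitrary test statistic chosen for illustration

# Two-sided p value under the standard normal distribution
p_normal = 2 * stats.norm.sf(statistic)

# Two-sided p values under the t distribution for several degrees of freedom (n - 1)
p_by_dof = {dof: 2 * stats.t.sf(statistic, dof) for dof in (4, 9, 29, 129)}

print(f"normal:        p = {p_normal:.4f}")
for dof, p_t in p_by_dof.items():
    print(f"t, dof = {dof:>3}:  p = {p_t:.4f}")
```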

5) At what temperature should we consider someone's temperature to be "abnormal"?

Start by computing the margin of error and confidence interval.



In [ ]:

    
# Calculates margin of error for sample mean with 95% confidence

print('The mean temperature of the data is', np.mean(temperature))

z = 1.96 # this is the value of z for 95% confidence

error = z * np.std(temperature) / np.sqrt(len(temperature))

print('margin of error for a sample mean =', error)









    



The mean temperature of the data is 98.24923076923078
margin of error for a sample mean = 0.125550964803

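The average temperature of all humans is estimated with 95% confidence to be 98.25 +/- 0.126, or between 98.124 and 98.376 degrees Fahrenheit. If we define an "abnormal" temperature to be outside this range, it would include all temperatures greater than 98.376 and less than 98.124. Note, however, that this interval describes the uncertainty in the mean rather than the spread of individual temperatures, so it is too narrow for judging a single reading.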



In [ ]:

    
# Calculates the 2.5th and 97.5th percentiles, which bound the central 95% of the observed temperatures
confidence_interval = np.percentile(temperature, [2.5, 97.5])

print('We expect 95% of the temperature data to be between', confidence_interval[0], 'and', confidence_interval[1])









    



We expect 95% of the temperature data to be between 96.7225 and 99.4775

If we define an "abnormal" temperature to be outside the central 95% of the observed data, this would include temperatures greater than 99.478 and less than 96.723.
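The two ranges answer different questions, and the difference comes down to whether the standard deviation is divided by $\sqrt{n}$. The sketch below uses the summary statistics from `df.describe()` and assumes the temperatures are approximately normal; under that assumption, the range for individuals is the mean plus or minus about 1.96 standard deviations, which is close to the percentile range found above.

```python
import math
from statistics import NormalDist

# Summary statistics taken from the df.describe() output
n, sample_mean, sample_std = 130, 98.249231, 0.733183

z_crit = NormalDist().inv_cdf(0.975)  # about 1.96 for 95% coverage

# Uncertainty in the MEAN: uses the standard error s / sqrt(n)
margin_mean = z_crit * sample_std / math.sqrt(n)
ci_mean = (sample_mean - margin_mean, sample_mean + margin_mean)

# Spread of INDIVIDUAL readings (normal approximation): uses s directly
margin_individual = z_crit * sample_std
range_individual = (sample_mean - margin_individual, sample_mean + margin_individual)

print(f"95% CI for the mean:       {ci_mean[0]:.3f} to {ci_mean[1]:.3f}")
print(f"95% range for individuals: {range_individual[0]:.3f} to {range_individual[1]:.3f}")
```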

6) Is there a significant difference between males and females in normal temperature?

What test did you use and why?
Write a story with your conclusion in the context of the original problem.

A two-sample permutation test on the difference in means is appropriate for this problem. Under the null hypothesis, males and females share the same temperature distribution, so the gender labels can be shuffled freely to simulate what differences would arise by chance. First we should visualize the data with exploratory data analysis.



In [ ]:

    
# Plots the ECDF for the temperatures of males and females

male_temperature = df[df['gender'] == 'M']['temperature']
female_temperature = df[df['gender'] == 'F']['temperature']

x_male, y_male = ecdf(male_temperature)
x_female, y_female = ecdf(female_temperature)

plt.plot(x_male, y_male, marker='.', linestyle='none', color='red')
plt.plot(x_female, y_female, marker='.', linestyle='none', color='blue')
plt.xlabel('Temperature(F)')
plt.ylabel('ECDF')
plt.legend(('Male', 'Female'), loc='lower right')
plt.title('Male vs Female: Human Body Temperature')
plt.show()

male_and_female_diff = np.abs(np.mean(male_temperature) - np.mean(female_temperature))
print('The difference between the male and female mean temperatures is', male_and_female_diff)









    












    



The difference between the male and female mean temperatures is 0.289230769231

We can see that the male and female ECDFs largely overlap, which tells us that the difference between the two data sets is small to begin with (0.289). We can now continue with hypothesis testing to see whether this difference reflects a real difference between genders or arose by chance.

$H_0$: There is no difference in the distribution and means of males and females.

$H_A$: There is a difference in the distribution and means of males and females.



In [ ]:

    
permutation_replicates = np.empty(100000)

size = len(permutation_replicates)

for i in range(size):
    combined_perm_temperatures = np.random.permutation(np.concatenate((male_temperature, female_temperature)))

    male_permutation = combined_perm_temperatures[:len(male_temperature)]
    female_permutation = combined_perm_temperatures[len(male_temperature):]

    permutation_replicates[i] = np.abs(np.mean(male_permutation) - np.mean(female_permutation))
    
p_val = np.sum(permutation_replicates >= male_and_female_diff) / len(permutation_replicates)

print('p =', p_val)









    



p = 0.02462

The p value of about 0.025 is below the 0.05 significance level, which shows that the difference in the means of male and female temperatures is statistically significant. We can reject the null hypothesis ($H_0$).
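A parametric alternative to the permutation test is Welch's two-sample $t$ test, which does not assume equal variances. The sketch below runs `scipy.stats.ttest_ind` with `equal_var=False` on two synthetic groups and checks it against the statistic computed by hand. The group sizes, means, standard deviations, and seed are all assumptions for illustration, not values from the dataset.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)  # arbitrary seed, assumed for reproducibility

# Synthetic stand-ins for two groups (NOT the real male/female data)
group_a = rng.normal(loc=98.1, scale=0.70, size=65)
group_b = rng.normal(loc=98.4, scale=0.74, size=65)

# Welch's t test: two-sided, unequal variances allowed
t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=False)

# The same statistic by hand: difference in means over the combined standard error
se = np.sqrt(group_a.var(ddof=1) / group_a.size + group_b.var(ddof=1) / group_b.size)
manual_t = (group_a.mean() - group_b.mean()) / se

print(f"scipy t = {t_stat:.4f}, manual t = {manual_t:.4f}, p = {p_value:.4f}")
```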

Conclusion

The mean normal body temperature was held to be 37$^{\circ}$C or 98.6$^{\circ}$F for more than 120 years since it was first conceptualized and reported by Carl Wunderlich in a famous 1868 book. However, this value is not supported by the data. The mean normal body temperature was computed with 95% confidence to be between 98.124$^{\circ}$F and 98.376$^{\circ}$F. There is also a statistically significant difference in mean temperature between males and females.
